An experience on statistical machine translation between Spanish and the regional languages of Spain
Authors
Abstract
Statistical machine translation systems between Spanish and the regional languages of Spain have become a topic of research interest over the last decade. However, regional languages usually lack the linguistic resources necessary to build such systems. This paper describes the development of three statistical machine translation systems between Spanish and three other languages: Galician, Catalan and Basque, focusing on the corpora used and the techniques applied to improve their performance.
Similar resources
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...
Overcoming statistical machine translation limitations: error analysis and proposed solutions for the Catalan-Spanish language pair
This work aims to improve an N-gram-based statistical machine translation system between the Catalan and Spanish languages, trained with an aligned Spanish–Catalan parallel corpus consisting of 1.7 million sentences taken from El Periódico [M. Farrús, M. R. Costa-jussà, J. B. Mariño, M. Poch, A. Hernández, C. Henríquez, J. A. R. Fonollosa; TALP Research Center, Department of Signal Theory and Comm...]
An Open-Source Shallow-Transfer Machine Translation Engine for the Romance Languages of Spain
We present the current status of development of an open-source shallow-transfer machine translation engine for the Romance languages of Spain (the main ones being Spanish, Catalan and Galician) as part of a larger government-funded project which includes non-Romance languages such as Basque and involving both universities and linguistic technology companies. The machine translation architecture...
Catalan-English Statistical Machine Translation without Parallel Corpus: Bridging through Spanish
This paper presents a full experiment on large-vocabulary Catalan-English statistical machine translation without an English-Catalan parallel corpus, in the context of the debates of the European Parliament. For this, we make use of an English-Spanish European Parliament Proceedings parallel corpus and a Spanish-Catalan general newspaper parallel corpus, both of which contain more than 30 M words. G...
N-best Reordering in Statistical Machine Translation
As statistical machine translation (SMT) systems strive to improve the translation quality they are able to deliver, word reordering is emerging as a major problem that must be addressed if these systems are to be improved. While most published work reports results on corpora involving English, Chinese and Arabic, such a translation problem can also be found within...